High-Accuracy Low-Precision Training
Abstract
Low-precision computation is often used to lower the time and energy cost of machine learning, and recently hardware accelerators have been developed to support it. Still, it has been used primarily for inference—not training. Previous low-precision training algorithms suffered from a fundamental tradeoff: as the number of bits of precision is lowered, quantization noise is added to the model, which limits statistical accuracy. To address this issue, we describe a simple low-precision stochastic gradient descent variant called HALP. HALP converges at the same theoretical rate as full-precision algorithms despite the noise introduced by using low precision throughout execution. The key idea is to use SVRG to reduce gradient variance, and to combine this with a novel technique called bit centering to reduce quantization error. We show that on the CPU, HALP can run up to 4× faster than full-precision SVRG and can match its convergence trajectory. We implemented HALP in TensorQuant, and show that it exceeds the validation performance of plain low-precision SGD on two deep learning tasks.
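To make the bit-centering idea concrete, here is a minimal NumPy sketch of the HALP outer/inner loop. It assumes nearest rounding, a strong-convexity estimate mu that sets the dynamic quantization range, and hypothetical grad_i/grad_full callbacks; the paper's actual implementation runs the inner loop entirely in fixed-point arithmetic, so treat this as an illustration, not the authors' code.

```python
import numpy as np

def quantize(x, scale, bits=8):
    """Round x onto a fixed-point grid of 2**bits levels spanning [-scale, scale]."""
    levels = 2 ** (bits - 1) - 1
    step = scale / levels
    return np.clip(np.round(x / step), -levels, levels) * step

def halp(grad_i, grad_full, w0, n, alpha=0.05, mu=0.1, epochs=10, inner=200, bits=8):
    """Sketch of the HALP outer/inner loop.

    grad_i(i, w)  -- gradient of example i at w (hypothetical callback)
    grad_full(w)  -- full gradient at w (hypothetical callback)
    mu            -- strong-convexity estimate that sets the dynamic range
    """
    w_tilde = np.array(w0, dtype=np.float64)
    for _ in range(epochs):
        g_tilde = grad_full(w_tilde)            # full-precision SVRG anchor gradient
        scale = np.linalg.norm(g_tilde) / mu    # bit centering: range shrinks as we converge
        z = np.zeros_like(w_tilde)              # low-precision offset from w_tilde
        for _ in range(inner):
            i = np.random.randint(n)
            v = grad_i(i, w_tilde + z) - grad_i(i, w_tilde) + g_tilde  # SVRG estimator
            z = quantize(z - alpha * v, scale, bits)                   # low-precision update
        w_tilde = w_tilde + z                   # recenter the representation
    return w_tilde
```

Because the quantization range is recentered around w_tilde and rescaled by the shrinking gradient norm each epoch, the fixed number of bits buys progressively finer resolution exactly where the iterates live, which is what lets the method match the full-precision convergence rate.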
Similar resources
WRPN & Apprentice: Methods for Training and Inference using Low-Precision Numerics
Today’s high-performance deep learning architectures involve large models with numerous parameters. Low-precision numerics has emerged as a popular technique to reduce both the compute and memory requirements of these large models. However, lowering precision often leads to accuracy degradation. We describe three software-based schemes whereby one can both train and do efficient inference using...
Ternary Residual Networks
Sub-8-bit representations of DNNs incur a discernible loss of accuracy despite rigorous (re)training at low precision. Such a loss of accuracy essentially makes them equivalent to a much shallower counterpart, diminishing the power of being deep networks. To address this problem of accuracy drop, we introduce the notion of residual networks where we add more low-precision edges to sensitive bran...
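As a rough illustration of the residual idea in this blurb (not the paper's exact edge-selection scheme), one can greedily stack ternary components so that each new component absorbs the quantization error left by the previous ones. The threshold delta and component count k below are illustrative assumptions:

```python
import numpy as np

def ternarize(w, delta):
    """Project weights onto {-s, 0, +s}: zero entries below the threshold
    delta, and share one scale s across the surviving entries."""
    mask = np.abs(w) > delta
    s = np.abs(w[mask]).mean() if mask.any() else 0.0
    return s * np.sign(w) * mask

def ternary_plus_residual(w, delta, k=2):
    """Greedy sketch: each extra ternary component absorbs the
    quantization error left by the components before it."""
    approx = np.zeros_like(w)
    for _ in range(k):
        approx += ternarize(w - approx, delta)
    return approx
```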
Synchronous Multi-GPU Deep Learning with Low-Precision Communication: An Experimental Study
Training deep learning models has received tremendous research interest recently. In particular, there has been intensive research on reducing the communication cost of training when using multiple computational devices, through reducing the precision of the underlying data representation. Naturally, such methods induce system trade-offs—lowering communication precision could decrease communica...
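A minimal sketch of the kind of low-precision communication such studies examine: each worker quantizes its gradient to 8 bits plus one float scale before exchange, and stochastic rounding keeps the compressed gradient unbiased. The function names and the simulated exchange are illustrative assumptions, not any particular system's API:

```python
import numpy as np

def compress_grad(g, bits=8):
    """Quantize a gradient to `bits` bits plus one float scale before it is
    sent; stochastic rounding keeps the compressed gradient unbiased."""
    levels = 2 ** (bits - 1) - 1
    scale = max(np.abs(g).max() / levels, 1e-12)
    scaled = g / scale
    low = np.floor(scaled)
    q = low + (np.random.rand(*g.shape) < (scaled - low))  # stochastic rounding
    return q.astype(np.int8), scale

def decompress_grad(q, scale):
    return q.astype(np.float64) * scale

# Simulated exchange: four workers send 8-bit gradients, the receiver
# dequantizes and averages them.
grads = [np.random.randn(1000) for _ in range(4)]
avg = np.mean([decompress_grad(*compress_grad(g)) for g in grads], axis=0)
```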
The effect of initial teaching on evaluation of left ventricular volumes by cardiovascular magnetic resonance imaging: comparison between complete and intermediate beginners and experienced observers
BACKGROUND High reproducibility and low intra- and interobserver variability are important strengths of cardiac magnetic resonance (CMR). In clinical practice, however, a significant learning curve may be observed. Basic CMR courses offer an average of 1.4 h dedicated to lecturing and demonstrating left ventricular (LV) function analysis. The purpose of this study was to evaluate the effect of in...
Training Quantized Nets: A Deeper Understanding
Currently, deep neural networks are deployed on low-power portable devices by first training a full-precision model using powerful hardware, and then deriving a corresponding low-precision model for efficient inference on such systems. However, training models directly with coarsely quantized weights is a key step towards learning on embedded platforms that have limited computing resources, memo...
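A common recipe studied in this line of work keeps a full-precision "shadow" copy of the weights, quantizes it on the forward pass, and applies the gradient straight through the quantizer to the shadow copy. The sketch below, with a least-squares loss and 2-bit weights as illustrative assumptions, shows the shape of one such training step:

```python
import numpy as np

def quantize_weights(w, bits=2):
    """Uniform quantizer applied on the forward pass."""
    levels = 2 ** (bits - 1) - 1
    scale = max(np.abs(w).max(), 1e-12) / levels
    return np.round(w / scale) * scale

def train_step(w, x, y, lr=0.01):
    """One straight-through step: forward with the quantized weights,
    but update the full-precision 'shadow' copy w."""
    wq = quantize_weights(w)
    err = x @ wq - y            # forward pass uses quantized weights
    grad = x.T @ err / len(y)   # gradient passed straight through the quantizer
    return w - lr * grad        # update the full-precision copy
```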
Publication date: 2018